ROX-21873: increase sensitivity on alert for tenant db connections #271

Merged: johannes94 merged 1 commit into master from jmalsam/ROX-21873-db-connection-alert-threshold on Jul 16, 2024


@johannes94 johannes94 commented Jul 15, 2024

I've tested this alert as part of https://issues.redhat.com/browse/ROX-21873 by blocking DB ingress via AWS security groups.

It took 35 minutes for this alert to appear in Alertmanager. This is because the alert is based on the AVG metric and has a wait time of 20 minutes defined. Since a broken DB connection is a critical state for the entire system, I think we should alert on it earlier.

Looking at the metrics for longer-running tenants, it seems to make sense to continue using the AVG, because during central rollouts this metric might be down for some time. So instead of increasing the threshold, I reduced the wait time to 10 minutes.
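To illustrate why an averaged metric combined with a wait time delays the alert, here is a rough Python sketch. It is a simplified model and not the actual rule from this PR: the function name `minutes_until_firing`, the 30-minute averaging window, the 0.5 threshold, and the one-sample-per-minute gauge are all assumptions chosen for illustration, and Alertmanager grouping delays are not modeled.

```python
def minutes_until_firing(window_min, threshold, for_min, horizon_min=240):
    """Simulate a 1/0 connectivity gauge sampled once per minute.

    The gauge is 1 (healthy) before the outage and 0 from minute 1 onward.
    The alert condition is: average of the last `window_min` samples < threshold.
    The alert fires once the condition has held for `for_min` minutes
    (a simplified model of a rule's wait time).
    Returns the minute at which the alert fires, or None within the horizon.
    """
    pending_since = None
    for t in range(1, horizon_min + 1):
        # Number of "down" samples currently inside the averaging window.
        down_samples = min(t, window_min)
        avg = (window_min - down_samples) / window_min
        if avg < threshold:
            if pending_since is None:
                pending_since = t  # condition just became true
            if t - pending_since >= for_min:
                return t  # condition held long enough: alert fires
        else:
            pending_since = None  # condition cleared, reset the wait
    return None


# Hypothetical values: 30-minute window, threshold 0.5.
before = minutes_until_firing(window_min=30, threshold=0.5, for_min=20)
after = minutes_until_firing(window_min=30, threshold=0.5, for_min=10)
print("wait time 20m:", before, "minutes until firing")
print("wait time 10m:", after, "minutes until firing")
```

In this model the total delay is the time the average needs to cross the threshold plus the wait time, so cutting the wait time from 20 to 10 minutes shortens the delay by 10 minutes while leaving the averaging, which absorbs short dips during rollouts, unchanged.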

@johannes94 johannes94 requested a review from a team July 15, 2024 13:49
@johannes94 johannes94 requested a review from a team as a code owner July 15, 2024 13:49
@johannes94 johannes94 force-pushed the jmalsam/ROX-21873-db-connection-alert-threshold branch from f665045 to 8257d73 Compare July 15, 2024 13:50
@johannes94 johannes94 merged commit 67f7f4f into master Jul 16, 2024
2 checks passed
@johannes94 johannes94 deleted the jmalsam/ROX-21873-db-connection-alert-threshold branch July 16, 2024 05:59
johannes94 added a commit that referenced this pull request Jul 23, 2024
johannes94 added a commit that referenced this pull request Jul 23, 2024
aaa5kameric pushed a commit that referenced this pull request Aug 28, 2024